Nature Protocols
Springer Science and Business Media LLC
Preprints posted in the last 30 days, ranked by how well they match Nature Protocols's content profile, based on 30 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
O'Roberts, E.; Panshikar, P. R.; Li-Wang, X.; Avenel, C.; Verron, Q.; Coulier, E.; Bienko, M.; Stadler, C.
Different omics types such as genomics and proteomics all contribute to deciphering biology. Applying these omics approaches in a spatial context helps reveal biology in situ at a single cell level. Here we present a protocol for the combined multiplexed detection of targeted genes using DNA FISH, and proteins using multiplexed immunofluorescence. The protocol is integrated on the commercial PhenoCycler platform and generates one single dataset with gene and protein readout at a single cell level in large tissue sections, allowing for a throughput of thousands to millions of cells. The workflow can be used for characterising malignant cells in large tumor areas based on genetic aberrations, while deciphering the cellular landscape and microenvironment from multiplexed protein detection using immunofluorescence.
Bhattarai, A.; Smith, J.; Abdelgaffar, H.; Carpenter, R.; Mishra, S.; Fuentes, J. L. J.; Shirsekar, G.
This protocol details the extraction of high-molecular-weight genomic DNA from grapevine tissues (wild and cultivated Vitis spp., including pathogen-infected samples) and the subsequent preparation of Illumina® whole-genome sequencing libraries using bead-bound Tn5 transposase. It is designed to overcome challenges from polyphenolic compounds and secondary metabolites in wild plants, providing a cost-effective workflow for large-scale population genomics. It includes recipes for buffers, incubation times, critical notes, and troubleshooting tips to maximize yield and library quality. Although designed for grapevine DNA, this protocol is potentially applicable to other similar wild plant species. Highlights: Optimized CTAB-PTB DNA extraction protocol for field-collected wild plant tissues. Effective removal of polyphenols and secondary metabolites associated with DNA using PTB. Cost-effective Illumina DNA Prep library preparation using bead-bound Tn5 transposase (tagmentation). Scalable workflow suitable for large-scale population genomics in Vitis species. Validated method for high-molecular-weight DNA and high-quality sequencing data.
Xu, X.; Caggiano, M. P.; Wells, M. L.; Sun, G.; Lim, S. M.; Multari, D. H.; Blundell, S. A.; Hartel, N.; Viner, R.; Polo, J. M.; Schittenhelm, R.; de Marco, A.
Transcriptomic and proteomic measurements from the same single cell provide complementary information that cannot be inferred from either modality alone, yet methods for the parallel recovery of both analyte classes from a single-cell lysate remain limited. Here, we describe a workflow in which individual cells are isolated by automated dispensing into a minimal, MS-compatible lysis volume, followed by sequential mRNA capture and protein supernatant recovery, prior to independent downstream processing. The method is compatible with standard library preparation and data-independent acquisition proteomics pipelines and requires no dedicated instrumentation beyond a single-cell dispensing platform. We evaluated workflow performance on 67 single cells across 3 iBlastoids. Transcriptomic sequencing detected a median of 5,375 genes per cell, and proteomic analysis identified a median of 2,123 protein groups per cell across two mass spectrometry platforms. Compared with a standalone single-cell proteomics protocol, incorporating the mRNA extraction step reduced median proteomic depth by approximately 11% (median 1,965 vs. 2,204 protein groups per cell), while mean per-cell identification remained comparable across workflows (1,790 vs. 1,775 protein groups per cell). Direct comparison of paired transcript and protein abundance yielded a median Spearman correlation of ρ ≈ 0.38; after correction for detection depth, the partial correlation was 0.067.
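The abstract reports a Spearman correlation and a partial correlation after correcting for detection depth, but does not say how either was computed. The sketch below shows one common way to do it, a first-order partial Spearman correlation, using only the Python standard library. The function names and the idea of using a single covariate `z` for detection depth are assumptions made for illustration, not the authors' method.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """Spearman correlation of x and y after removing the rank-linear effect of z.

    z is a hypothetical covariate standing in for detection depth.
    """
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

When both abundances track the covariate closely, the partial correlation can fall well below the raw one, which is the pattern the abstract describes.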
Greenwood, M. E.; Austin, S.; Murciano-Martinez, P.; Hollywood, K. A.; Machidon, M.; Spiess, R.; Berrington, J.; Flitsch, S.; Barran, P.; Stewart, C. J.
Human milk contains structurally diverse glycans with key roles in shaping infant development, yet analytical constraints limit characterisation from low-volume samples. Glycosaminoglycans (GAGs), including chondroitin sulphate (CS), are understudied due to existing protocols requiring sample volumes of at least 5 mL and lengthy extraction steps prior to instrumental analysis. This study establishes a workflow for quantifying CS disaccharides from 25 µL of human milk, enabling analysis of samples previously inaccessible to GAG profiling, such as those collected as salvage samples from neonatal intensive care units. For CS quantification, the CS is first enzymatically depolymerised using chondroitinase ABC to release repeating disaccharide units. Matrix complexity is reduced via two rounds of acetonitrile-based protein and lipid precipitation. Disaccharides are separated by hydrophilic interaction liquid chromatography and detected using a Triple Quadrupole Mass Spectrometer, providing robust sensitivity for all CS disaccharides. Method development and validation were performed using pooled mature human milk from term infants. This workflow facilitates detection of all CS disaccharides, with low but reproducible recoveries for total CS. Low- and high-level spike recoveries were 41.3% (RSDr 7.5%, RSDiR 15.9%) and 43.7% (RSDr 24.4%, RSDiR 27.9%), respectively. Despite modest absolute accuracy, precision remained sufficient to make relative comparison of CS concentrations between samples. This method expands the analytical toolkit for human milk glycomics, enabling same-day preparation and CS profiling from sample volumes that are 200 times smaller than prior work, supporting future investigations into GAG-mediated functions in early life.
Graphical abstract: Schematic of the sample preparation protocol. 25 µL of human milk is combined with lyase enzymes and TRIS buffer containing the internal standard prior to incubation. Samples then undergo multiple rounds of centrifugation and refrigeration before analysis via LC-MS/MS. Made using BioRender.com. Glycan nomenclature following Varki et al., 2015.
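The recovery and precision figures quoted in this abstract follow standard definitions, which can be sketched in a few lines of Python. The function names and the example numbers below are illustrative assumptions. Only the 5 mL and 25 µL volumes come from the abstract.

```python
from statistics import mean, stdev

def percent_recovery(measured_spiked, measured_unspiked, spiked_amount):
    """Spike recovery: share of the added analyte that is actually measured."""
    return 100.0 * (measured_spiked - measured_unspiked) / spiked_amount

def rsd_percent(values):
    """Relative standard deviation (sample SD over mean), in percent."""
    return 100.0 * stdev(values) / mean(values)

# Volume reduction quoted in the abstract: 5 mL (5000 µL) versus 25 µL.
volume_ratio = 5000 / 25

# Hypothetical replicate measurements, for illustration only.
replicates = [9.0, 10.0, 11.0]
print(percent_recovery(14.13, 10.0, 10.0), rsd_percent(replicates), volume_ratio)
```

A recovery near 40% with a small RSD is the situation the abstract describes: absolute values are biased low, but the bias is stable enough to compare samples against each other.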
Staller, S. A.; Valentine, V.; Burden, S.
Summary: Sequential multiplexed fluorescence in situ hybridization (FISH) enables spatially resolved molecular profiling in cell monolayers, but analyzing puncta colocalization across three-dimensional (3D) datasets remains a labor-intensive bottleneck. zFISHer is an open-source application built on the napari viewer that provides complete automation of sequential FISH image processing in conjunction with interactive user-curation tools. zFISHer provides end-to-end analysis of paired FISH datasets, encompassing nuclear segmentation, automated puncta detection on unaligned z-stacks, multi-round image registration via translation-constrained RANSAC with optional B-spline deformable warping, precise transformation of puncta coordinates into aligned space, consensus nuclei generation, interactive editing with real-time collision detection, and pairwise and tri-channel colocalization analysis with statistics. This includes a "Fishing Hook" raycasting algorithm that enables users to locate puncta at their true 3D centroids by identifying intensity maxima along the camera ray, eliminating manual z-slice navigation, complemented by a sub-voxel volume optimization. The included batch processing mode enables high-throughput unattended analysis of multiple experimental datasets. Availability and Implementation: zFISHer is open source under the MIT license and freely available on GitHub: https://github.com/stjude/zFISHer. The example dataset (deconvolved ND2 image stacks) is archived on Zenodo at https://doi.org/10.5281/zenodo.20288536. zFISHer is developed in Python and uses the napari viewer for its interface. Documentation and expected test outputs for the sample dataset are available in the same GitHub repository. To report a problem with zFISHer or to contribute to it, please file an issue in the GitHub repository: https://github.com/stjude/zFISHer/issues. Contact: Seth.Staller@STJUDE.ORG. Supplementary Information: Supplementary data are available online.
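The idea of locating a punctum by finding the intensity maximum along the camera ray can be illustrated with a minimal sketch. This is not zFISHer's implementation: the function name, the nearest-voxel sampling, and the fixed step size are all assumptions, and the volume is a plain nested list indexed as `volume[z][y][x]` so the example runs without third-party packages.

```python
def brightest_point_along_ray(volume, origin, direction, n_steps=200, step=0.5):
    """Walk along a ray through a 3D volume and return the brightest voxel hit.

    volume    -- nested lists indexed as volume[z][y][x]
    origin    -- (z, y, x) start of the ray in voxel coordinates
    direction -- (dz, dy, dx) ray direction in voxel coordinates
    Returns the (z, y, x) index of the maximum, or None if the ray misses.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    best_value, best_voxel = None, None
    for i in range(n_steps):
        # Nearest-voxel sampling of the point origin + i * step * direction.
        z, y, x = (round(o + i * step * d) for o, d in zip(origin, direction))
        if not (0 <= z < nz and 0 <= y < ny and 0 <= x < nx):
            continue  # sample lies outside the volume
        value = volume[z][y][x]
        if best_value is None or value > best_value:
            best_value, best_voxel = value, (z, y, x)
    return best_voxel

# Toy volume: 5 z-slices of 3x3 pixels with one bright voxel at z=3.
volume = [[[0] * 3 for _ in range(3)] for _ in range(5)]
volume[3][1][1] = 9
print(brightest_point_along_ray(volume, origin=(0, 1, 1), direction=(1, 0, 0)))
```

A click in a 3D view defines the ray; taking the maximum along it resolves the depth coordinate without scrolling through z-slices. A sub-voxel refinement, as mentioned in the abstract, would follow this step.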
Dönmez, A.; Nosov, O.; Heck, K.; Mosig, A.; Fritsche, E.; Koch, K.
Motivation: The ToxCast database is a valuable resource for computational toxicology and new approach methodologies (NAMs), but the approximately 100 GB MySQL distribution is difficult to use for portable local analysis and cross-domain evidence mining. Many practical questions concern chemicals, in vitro bioactivity, in vivo toxicological evidence, and exposure-relevant product-use context rather than raw database keys. Results: We present ToxCastLite, a portable semantic evidence-access system that combines assay-scoped SQLite databases with a compact RDF layer for GraphDB-based querying. The system streams large ToxCast/invitrodb MySQL dumps into curated SQLite profiles, reducing the footprint to approximately 3 GB for focused use cases such as developmental neurotoxicity. Dense numerical evidence, including concentration-response rows, remains in SQLite, while the RDF projection exposes linked semantic entities such as chemicals, assays, endpoints, model results, potency parameters (AC50), and MC6 quality flags. We further extend the graph with CPDat v4.0 product-use and functional-use evidence and ToxRefDB v3.0 in vivo toxicity evidence, including processed studies, point-of-departure records, effect summaries, and observation summaries. These layers are linked through DSSTox Substance Identifiers, enabling integrated queries across NAM bioactivity, curated animal-study evidence, and exposure/use context. A Streamlit prototype supports exploration through a locally deployed LLM that translates natural-language questions into SPARQL, grounded by a versioned RDF schema to reduce hallucination risk. Case studies in developmental neurotoxicity demonstrate how ToxCastLite identifies concordance between high-confidence in vitro DNT activity and positive in vivo apical evidence, detects in vitro DNT activity beyond available DNT-specific in vivo evidence, and prioritizes chemicals where NAM signals, ToxRefDB evidence, and CPDat product-use context intersect.
For selected results, users can drill down from the semantic graph to the underlying SQLite records and retrieve concentration-response curves for expert inspection without manually writing SQL or SPARQL. Availability: Project website at toxcast-lite.github.io/. Contact: arif.doenmez@iuf-duesseldorf.de
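The drill-down from a graph entity to its SQLite concentration-response rows can be pictured with a small sketch using Python's built-in `sqlite3` module. The table name, column names, identifiers, and values below are all hypothetical and do not reflect the actual ToxCastLite or invitrodb schema. The only point taken from the abstract is that the substance identifier acts as the join key between the graph and the relational store.

```python
import sqlite3

# In-memory database with a hypothetical concentration-response table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE conc_response ("
    "dtxsid TEXT, assay TEXT, conc_um REAL, response REAL)"
)
rows = [
    ("DTXSID_EXAMPLE_1", "assay_A", 10.0, 60.0),
    ("DTXSID_EXAMPLE_1", "assay_A", 0.1, 2.0),
    ("DTXSID_EXAMPLE_1", "assay_A", 1.0, 15.0),
    ("DTXSID_EXAMPLE_2", "assay_A", 1.0, 3.0),
]
conn.executemany("INSERT INTO conc_response VALUES (?, ?, ?, ?)", rows)

def curve_for(conn, dtxsid, assay):
    """Return (concentration, response) points for one chemical and assay."""
    cursor = conn.execute(
        "SELECT conc_um, response FROM conc_response "
        "WHERE dtxsid = ? AND assay = ? ORDER BY conc_um",
        (dtxsid, assay),
    )
    return cursor.fetchall()

print(curve_for(conn, "DTXSID_EXAMPLE_1", "assay_A"))
```

Keeping the dense numeric rows in SQLite and only the entity-level facts in RDF means a graph query can return an identifier and the curve itself is fetched with one parameterised lookup like this.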
Fenn, A.; Hueckelhoven, R.; Kamal, N.
Dual-organism RNA sequencing (RNA-seq) experiments, in which the transcriptomes of a host and a microbe are sequenced simultaneously, are increasingly used to study plant-microbe interactions. A central analytical goal is identifying effector proteins and their host targets through gene co-expression. Weighted Gene Co-expression Network Analysis (WGCNA) is the dominant tool for gene co-expression analyses, yet its ability to recover interaction-interface genes from a merged dual-organism matrix has not been systematically characterised. Here we present a simulation framework using real gene models from Hordeum vulgare (barley) and Blumeria graminis f. sp. Hordei M.Liu & Hambl (powdery mildew) to evaluate single-network WGCNA across a gradient of plant-to-fungal library size ratios (1:1-20:1), three levels of co-expression signal strength, and three WGCNA network construction types (signed, unsigned, signed hybrid). We embed 20 model effector genes (bridge genes) driven by a mixed host-pathogen eigengene and evaluate recovery using four metrics aligned with the biological objective: cross-species hub rank, top-decile hub enrichment, bridge gene detection rate, and bridge co-separation (the fraction of effector-target pairs co-assigned to the same detected module). Across 225 simulation runs (15 conditions x 5 replicates x 3 network types), bridge genes are robustly identifiable as cross-species connectivity hubs (mean rank 0.92 versus 0.50 for module genes) but co-assignment of effector-target pairs to the same module fails in 41% of runs due to scale-free topology collapse. Signal strength (2 = 0.12) and library ratio (2 = 0.22) are the primary determinants of co-separation, while network type choice accounts for less than 2%. A read-depth bias systematically inflates pathogen gene hub ranks relative to host genes at high ratios. 
These results establish that the method can identify effector candidates as cross-species hubs under a broad range of conditions, but reliable co-assignment requires adequate pathogen read depth and a strong co-expression signal. These are properties that experimental design, not analytical parameterisation, must provide.
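Of the four metrics, bridge co-separation is defined explicitly in the abstract: the fraction of effector-target pairs co-assigned to the same detected module. A minimal sketch follows, assuming module assignments are available as a gene-to-label mapping and that a label of 0 marks genes left unassigned. That convention and the function name are assumptions for illustration, not details from the preprint.

```python
def bridge_co_separation(pairs, module_of, unassigned=0):
    """Fraction of (effector, target) pairs placed in the same detected module.

    pairs      -- list of (effector_gene, target_gene) tuples
    module_of  -- dict mapping gene name to module label
    unassigned -- label for genes not placed in any module (assumed to be 0)
    """
    if not pairs:
        return 0.0
    hits = 0
    for effector, target in pairs:
        m_eff = module_of.get(effector, unassigned)
        m_tgt = module_of.get(target, unassigned)
        # Two unassigned genes do not count as co-assigned.
        if m_eff != unassigned and m_eff == m_tgt:
            hits += 1
    return hits / len(pairs)

pairs = [("e1", "t1"), ("e2", "t2"), ("e3", "t3"), ("e4", "t4")]
module_of = {"e1": 1, "t1": 1, "e2": 1, "t2": 2, "e3": 0, "t3": 0, "e4": 3}
print(bridge_co_separation(pairs, module_of))
```

Because the metric depends on module boundaries, it degrades when module detection fails, whereas a hub-rank metric only needs connectivity. That difference is consistent with the contrast the abstract reports between hub recovery and co-assignment.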
Xenes, D.; Kitchell, L. M.; Rivlin, P. K.; Martinez, H.; Rose, V.; Bishop, C.; Brodsky, R.; Celii, B.; Ellis-Joyce, J.; Luna, D.; Norman-Tenazas, R.; Ramsden, D.; Romero, K.; Villafane-Delgado, M.; Collman, F.; Gray-Roncal, W.; Reimer, J.; Wester, B.
Connectomic reconstruction from large image volumes produces segmentation and synaptic-assignment errors that must be resolved to support downstream analyses. As datasets have grown larger and teams more distributed, proofreading has become a critical operational bottleneck. Workflows for proofreading and error correction have not scaled commensurately with connectomic data production and may not accommodate heterogeneous proofreader expertise and machine-generated candidate edits. New tools are therefore needed to organize, prioritize, and coordinate proofreading at volume scale. Here we present NeuVue, a task-management and prioritization framework that operationalizes proofreading through atomic, auditable tasks for individual and team review, multistage routing across proofreader cohorts, performance and volume-state tracking, and integration with community annotation, visualization, and analysis services. We report the use of NeuVue across two volumetric datasets, supporting scalable proofreading by over forty proofreaders and producing over fifty thousand edits. NeuVue provides a reproducible human-in-the-loop framework for generating, validating, and maintaining large connectomic datasets.
Kushnareva, A.; Tupikina, D.; Almessady, H.; McHardy, A.; Gurevich, A.
Summary: Biosynthetic gene clusters (BGCs) encode microbial natural products, many of which have important ecological and biomedical roles. Genome mining tools enable large-scale BGC prediction, but their outputs differ substantially, complicating comparison and interpretation. We present BGC-QUAST, a framework for evaluating and comparing BGC predictions across three analysis modes: comparison across samples, assessment of BGC recovery in draft assemblies relative to reference genomes, and comparison of predictions from different tools using overlap analysis. BGC-QUAST provides standardized metrics, interactive visualizations, and integrated outputs for joint inspection of predictions, enabling the comprehensive comparison of genome mining results and facilitating sample prioritisation based on biosynthetic potential. Availability and implementation: BGC-QUAST is publicly available at https://github.com/gurevichlab/bgc-quast.
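Comparing predictions from different tools by overlap can be sketched as interval matching on genomic coordinates. The abstract does not state BGC-QUAST's overlap rule, so the choices below (overlap measured relative to the shorter region, a 0.5 threshold, and the function names) are assumptions made purely for illustration.

```python
def overlap_fraction(a, b):
    """Overlap between two regions as a fraction of the shorter one.

    Each region is (contig, start, end) with end exclusive.
    """
    if a[0] != b[0]:
        return 0.0  # regions on different contigs never overlap
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / min(a[2] - a[1], b[2] - b[1])

def match_predictions(tool_a, tool_b, min_fraction=0.5):
    """Pair up regions from two tools whose overlap meets the threshold."""
    matched = []
    for a in tool_a:
        for b in tool_b:
            if overlap_fraction(a, b) >= min_fraction:
                matched.append((a, b))
    return matched

tool_a = [("contig1", 0, 100), ("contig2", 500, 900)]
tool_b = [("contig1", 50, 250), ("contig3", 500, 900)]
print(match_predictions(tool_a, tool_b))
```

Regions left unmatched on either side are the tool-specific predictions, which is typically where manual inspection is most useful.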
Ali, A.
We developed plant (Parallel Annotation of Transcriptomes), a de novo method that can potentially compare RNA-seq data of any two species without a reference genome. plant is conceptually similar to chromatography. In the same way a complex mixture is filtered to isolate its individual components, we applied a computational method to identify, annotate, and quantify components across transcriptomes. The comparison points are universal protein domain annotations rather than species-specific genes, as would be the case for a differential gene expression analysis. We looked at several Selaginella species via the 1000 Plant transcriptomes initiative (1KP) where RNA-seq data for various plant species have been made publicly available. The raw reads were assembled via Trinity. The assembled transcripts were then searched against the Pfam protein domain database via InterProScan. The assembled transcripts were also quantified via kallisto. By merging these two aspects, we were able to see how often a particular protein domain - a predicted protein structure - is expressed. These quantified annotations of protein domains are comparable across species, assuming a relatively short evolutionary distance. We were also able to identify the presence of species-specific protein domains and trace each annotation back to the gene. A bubble plot was created to visualize the distributions of Pfam annotations across species as well as GO terms.
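The merging step described here, combining per-transcript domain annotations with per-transcript abundance, amounts to summing expression over all transcripts that carry a given domain. A minimal sketch follows, assuming the annotations and abundances have already been parsed into dictionaries. The function name, the use of TPM as the abundance unit, and the placeholder domain labels are assumptions, not details from the preprint.

```python
from collections import defaultdict

def domain_expression(transcript_tpm, transcript_domains):
    """Sum transcript abundance per protein domain.

    transcript_tpm     -- dict: transcript ID -> abundance (e.g. TPM)
    transcript_domains -- dict: transcript ID -> list of domain accessions
    A transcript with several copies of a domain is counted once for it.
    """
    totals = defaultdict(float)
    for transcript, domains in transcript_domains.items():
        tpm = transcript_tpm.get(transcript, 0.0)
        for domain in set(domains):
            totals[domain] += tpm
    return dict(totals)

# Placeholder identifiers, for illustration only.
tpm = {"t1": 10.0, "t2": 5.0, "t3": 2.5}
domains = {"t1": ["DOMAIN_A", "DOMAIN_B"], "t2": ["DOMAIN_A"], "t4": ["DOMAIN_B"]}
print(domain_expression(tpm, domains))
```

Because the keys are domain accessions rather than gene names, tables built this way for two species share the same row labels and can be compared directly, which is the property the abstract relies on.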
Zander, S.; Zhou, X.-R.; Kranz, A.; Dumschott, K.; Rocca-Serra, P.; Weil, H. L.; Tschoepke, M.; Muehlhaus, T.; Von Suchodoletz, D.; Usadel, B.
Electronic laboratory notebooks (ELNs) are widely used in the life sciences, but their notebook format limits machine-readability and FAIR compliance. Consequently, researchers often spend significant manual effort restructuring ELN records into publication-ready outputs. We present elab2ARC, a browser-based workspace that automates the conversion of open-source eLabFTW records into Annotated Research Contexts (ARCs), which are version-controlled, ISA-compliant research objects. Using the eLabFTW API, elab2ARC retrieves administrative metadata, protocols, and attachments, reorganising them into ISA-compliant tables and linked datasets. All processing occurs client-side, ensuring user data control before submission to the PLANTdataHUB repository. An optional LLM-assisted workflow extracts structured metadata from free-text protocols, providing editable drafts while preserving human oversight. Designed for use at project completion, elab2ARC reuses existing ELN documentation without disrupting daily laboratory practice. It offers a practical route to FAIR-aligned sharing, publication, and long-term archiving of life-science experimental records. Availability and implementation: elab2ARC is freely accessible at https://nfdi4plants.org/elab2arc/. The source code is available at https://github.com/nfdi4plants/elab2arc under a GPL-3.0 license. Supplementary information: Supplementary data are available online.
Songara, D.; Ghosh, H. S.
The CaMKII promoter is widely used to label and manipulate hippocampal pyramidal neurons via transgenic mouse lines or viral approaches. While it targets most excitatory neurons, a small subset remains unlabeled and is often overlooked. We present an AAV-based strategy combined with CaMKII-driven Cre expression to access and study this remaining population. Furthermore, we provide a detailed protocol for in-house AAV production, targeted stereotaxic delivery, and functional validation of targeted neurons through slice electrophysiology and behavior.
Choudhury, A.; Kitak, T.; Carrillo, B.; Busch, P.; Emons, M.; Gunz, S.; Koderman, M.; Luo, S.; Mallona, I.; Meara, A.; Wissel, D.; Robinson, M. D.
In the past few years, we have seen a veritable surge in single-cell (e.g., RNA sequencing) techniques and datasets, enabling increasingly detailed characterization of cellular heterogeneity across tissues and conditions. This surge has been accompanied by a large number of analysis frameworks and pipelines, each with a large parameter space and many researcher degrees of freedom in how it is used. Many neutral benchmarks have been presented for various computational tasks, but most make design decisions that render them incompatible with each other, e.g., different datasets, metrics, or parameter sets. In this work, we showcase a recently developed framework, Omnibenchmark, to build reproducible, extensible and standardized method comparisons. This not only facilitates the broad investigation of pipelines used in single-cell data analysis, but also highlights how the process of building benchmarks can be streamlined and unified. We do this as an initial proof-of-principle for an arm's-length benchmark that evaluates five single-cell RNA sequencing pipelines (filtering to normalization to dimensionality reduction to clustering) on three datasets. This standardization enables benchmarks to be easily extended in several directions, including broader parameter sweeps, comparisons across software versions and architectures, isolation of pipeline steps, and integration of additional pipelines, datasets, and metrics.
Casals-Franch, R.; Nonell, L.; Villa-Freixa, J.; Lopez Garcia de Lomana, A.
Reconstructing dynamic immune cell state transitions from single-cell transcriptomic data requires coordinated analytical strategies that capture both phenotypic progression and underlying regulatory programs. This protocol describes a step-by-step computational workflow for analyzing human tumor-infiltrating T cells using the sequential application of dimensionality reduction, pseudotime trajectory inference, regulon activity analysis, and transcription factor-transcription factor network reconstruction. The workflow outlines data preprocessing and quality control, trajectory rooting and parameter selection, branch-specific differential analysis, and the integration of regulon inference to contextualize transcriptional programs along inferred trajectories. Regulon-based TF-TF network reconstruction is used as a downstream interpretive layer to identify regulatory modules associated with distinct cell-state transitions. The workflow is publicly available on GitHub at https://github.com/rogercasalsfr/immuno-trajectory-grn-integrative-workflow. The protocol emphasizes practical considerations including parameter sensitivity, trajectory robustness, and consistency between phenotypic and regulatory outputs. It supports reproducible analysis and interpretation of immune cell dynamics in human tumor microenvironment studies using single-cell RNA sequencing data.
Mahar, N. S.; Chouhan, K.; Gupta, I.
Real-time taxonomic classification of nanopore amplicon sequencing data enables rapid insights into microbial communities, with applications in clinical diagnostics, environmental monitoring, and outbreak surveillance. However, bridging the gap between long-read data and interpretable results often requires specialised bioinformatics expertise. There remains a need for integrated, user-friendly software that combines live data acquisition with downstream microbiome analysis. Here we present NANOTAXI, a fully automated Shiny-based GUI for the classification of barcoded 16S rRNA gene sequences generated by Oxford Nanopore sequencing. The platform supports four taxonomic classifiers, integrated with five reference databases, enabling flexible selection of classification strategies based on user requirements and available computational resources. In addition to real-time monitoring, NANOTAXI performs cohort-level analyses, including alpha and beta diversity, ordination, differential abundance testing, and functional inference using PICRUSt2. Validation using barcoded synthetic communities comprising pooled genomic DNA from clinically relevant bacterial species and the ZymoBIOMICS mock community demonstrated that NANOTAXI generated biologically coherent taxonomic and functional profiles. Benchmarking revealed clear trade-offs between computational performance and taxonomic specificity. Emu provided the lowest observed species-level false-positive rate, whereas Kraken2 offered the fastest classification and enabled continuous near-real-time monitoring across all tested databases. NANOTAXI is open source and freely available at https://github.com/Nirmal2310/NANOTAXI under the GPL version 3 license.
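Alpha diversity, one of the cohort-level analyses listed above, is often summarised with the Shannon index. The sketch below shows the standard formula applied to a vector of per-taxon read counts. It is not taken from NANOTAXI's code, and the function name and the choice of the natural logarithm are assumptions.

```python
from math import log

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

# Two equally abundant taxa give H = ln 2; a single taxon gives H = 0.
print(shannon_index([10, 10]), shannon_index([5]))
```

Taxa with zero reads are skipped because their contribution to the sum is zero in the limit, so padding a sample with absent taxa does not change its index.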
Kapoor, B.; Cregger, M. A.; Ranjan, P.
Motivation: Amplicon sequencing of 16S rRNA and internal transcribed spacer (ITS) gene regions is the most widely used approach for characterizing bacterial and fungal communities, respectively. The DADA2 pipeline has become a standard for inferring amplicon sequence variants (ASVs), offering single-nucleotide resolution over traditional OTU clustering. However, executing the full DADA2 workflow requires proficiency in R programming and manual coordination of multiple sequential steps, presenting a substantial barrier for researchers in clinical, environmental, and agricultural sciences who lack computational training. Results: We present RAPID (R-based Amplicon Pipeline for Interactive DADA2), a pair of R/Shiny applications providing complete graphical user interfaces for 16S rRNA and ITS amplicon sequence analysis. The 16S application implements a 10-step guided workflow from raw paired-end FASTQ files through quality filtering, error learning, dereplication, paired-read merging, chimera removal, taxonomy assignment (SILVA), phyloseq construction with data transformation (rarefaction, relative abundance, or CLR), interactive visualization (rarefaction curves, alpha diversity, NMDS, PCoA, taxonomic abundance), PERMANOVA, and ANCOM-BC2 differential abundance analysis. The ITS application extends this to an 11-step workflow, adding an automated primer removal step using cutadapt with support for multiple primers and length-variable amplicons, and uses the UNITE database for fungal taxonomy. Both applications feature asynchronous background processing, session persistence, real-time progress monitoring, publication-ready figure export, and comprehensive result downloads. Availability: RAPID is freely available at https://github.com/beantkapoor786/RAPID. Both applications can be installed locally on any system with R (version 4.0 or higher) and run as local web applications accessible through a standard browser.
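Of the data transformations listed, the centred log-ratio (CLR) is the least self-explanatory, so a short sketch may help. RAPID itself is written in R, and this Python version is only an illustration of the formula. The pseudocount of 0.5 used to handle zero counts and the function name are assumptions, not settings taken from the tool.

```python
from math import log

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample's counts.

    Each value becomes log(count + pseudocount) minus the mean of those logs,
    which is the log of the count relative to the sample's geometric mean.
    """
    logs = [log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

sample = [10, 0, 5, 100]
transformed = clr(sample)
print(transformed, sum(transformed))
```

The transformed values of a sample sum to zero (up to floating-point error), so they express each taxon relative to the sample as a whole rather than as an absolute count.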
Guo, J.
The rapid growth of molecular foundation models and large language models has encouraged a scale-centred view of AI in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and graph neural networks (GNNs) trained for individual tasks. We test this assumption across 26 endpoints for molecular properties, toxicity, safety liabilities and biological activity, grouped into ADME, toxicity and bioactivity classes. The benchmark contains 78 endpoint-and-split entries spanning random, Murcko-scaffold and structure-separated 5-fold CV. Ordered from easiest to hardest, these splits approximate retrospective evaluation on a closed library, scaffold expansion in hit-to-lead, and library expansion on novel chemotypes. Each entry includes ML, GNN, pretrained molecular sequence and LLM-based SAR families. Across 156 fold-mean comparisons, classical ML such as RF(ECFP4) and ExtraTrees(RDKit) win 116, GNNs such as GIN and Ligandformer win 25, pretrained sequence models such as MoLFormer and ChemBERTa2 win 12, and LLM-based SAR baselines win three. ML dominates random-split interpolation but loses part of this advantage under harder splits; GNN and sequence models also decline but gain relative ground, whereas LLM-based SAR is weaker in absolute terms yet less sensitive to the split axis. Paired bootstrap analyses support family-level trends more strongly than individual model rankings. SAR knowledge derived from training folds improves many GPT5.5-SAR and Opus4.7-SAR metrics but does not make rule-based reasoning a universal substitute for supervised predictors. Compact specialized models remain highly effective for molecular property and activity prediction. Larger models add value for SAR interpretation and reasoning in low-data settings, but predictive performance depends on the fit among model, task and validation scenario, not on scale alone.
Taouk, M. L.; Ingle, D. J.; Wick, R. R.
Background: Oxford Nanopore Technologies (ONT) sequencing is increasingly used for whole-genome sequencing (WGS) across a wide range of applications. However, the platform has evolved rapidly through updates to flow cell chemistry and basecalling algorithms, altering the characteristics of the resulting sequencing data. Read simulators provide synthetic datasets with known ground truth, enabling controlled development and evaluation of methods. However, many existing simulators were developed for earlier versions of ONT sequencing or use generic long-read assumptions, and their realism for contemporary ONT data is unclear. Results: We benchmarked six ONT-compatible read simulators (Badread, LongISLND, lrsim, NanoSim, PBSIM3 and SimLoRD) using a microbial genome reference and ONT R10.4.1 reads as the empirical standard. Each tool was configured to maximise realism, including training on empirical reads when supported. We compared simulated and real datasets with respect to read length, read accuracy, FASTQ quality scores and sequence error profiles. No simulator reproduced all metrics of the real data well. PBSIM3 most closely reproduced read length, read accuracy and FASTQ quality scores, making it a strong simulator for broad read-level realism. However, it did not capture important features of the real error profile, including context-dependent substitution rates and homopolymer-length errors. Badread and LongISLND better reproduced some aspects of the error profile, but showed other departures from the real data. Conclusion: PBSIM3 is a good general-purpose choice for many ONT WGS simulation tasks because it reproduced several key read-level properties well. However, Badread or LongISLND may be preferable for applications where error structure is more important. No evaluated tool was realistic across all tested metrics, highlighting a gap for improved long-read simulators.
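FASTQ quality scores, one of the compared properties, encode a per-base error probability, and a read's predicted accuracy can be derived from them. The sketch below assumes the common Phred+33 encoding and averages error probabilities rather than quality values. It illustrates the quality-score side only; the benchmark's own read accuracy measure may well be alignment-based, and the function name is an assumption.

```python
def predicted_read_accuracy(quality_string, offset=33):
    """Predicted accuracy of one read from its FASTQ quality string.

    Each character encodes Q = ord(char) - offset, and the per-base error
    probability is 10 ** (-Q / 10). Accuracy is 1 minus the mean error.
    """
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return 1.0 - sum(error_probs) / len(error_probs)

# '5' encodes Q20 (1% error) and '+' encodes Q10 (10% error) under Phred+33.
print(predicted_read_accuracy("5555555555"), predicted_read_accuracy("+5"))
```

Averaging probabilities rather than Q values matters because the scale is logarithmic: a few low-quality bases dominate the expected error, which a plain mean of Q scores would hide.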
Gillman, R.; Dwyer, B. J.; Pasic, S.; Shirolkar, G. D.; Main, N.; The Liver Cancer Collaborative, ; Field, M. A.; Schmitz, U.; Hebbard, L.
Background and Aims: A major goal of personalised liver oncology is the ability to make targeted predictions about cancer-specific toxicity; however, few methods are available. To address this, we validated the performance of our bioinformatics framework, TARGET-SL, through ex vivo drug screening. Methods: Using TARGET-SL, we predicted gain of function (GOF), loss of function (LOF) and synthetic lethal (SL) genetic events, and corresponding drug candidates. We validated drug predictions across hepatocellular carcinoma (HCC) cell lines, and a cohort of HCC and cholangiocarcinoma (CCA) patient-derived organoids (PDOs). Results: For HCC cells and PDOs, we found that 37.5% and 25% of the respective selected compounds induced unique target-specific growth inhibition based on genetic biomarkers, suggesting novel biomarker-driven drug sensitivities. Conclusions: Our analyses demonstrate TARGET-SL's potential to enhance personalized drug screening for liver cancer by focusing on genetically informed targets. This will reduce experimental costs and accelerate the pace of therapeutic discovery. Impact and Implications: Primary liver cancer (PLC) is a cancer with poor prognosis, and current therapies increase survival only for a minority of patients. Through the application of TARGET-SL we can predict, for each patient, the essential genes and corresponding small molecule inhibitors. These data support further investigation in larger patient cohorts and offer the possibility to specify new small molecule inhibitors and to repurpose current drugs for PLC treatment.
[Graphical Abstract: Figure 1]
Highlights:
TARGET-SL can predict gene and drug sensitivities for cell lines and patient-derived organoids.
This may reduce drug screening costs and accelerate the pace of therapeutic discovery.
TARGET-SL may assist in the repurposing of current drugs and their rapid translation for primary liver cancer.
TARGET-SL is tumour-type agnostic, and therefore may have application in other cancers with poor prognosis.
Lorimer, I.; Lui, M.; Makinson, O. J.; Walsh, M. L.; Matthews, T. J.; Woulfe, J.; Ardolino, M.
Background: Glioblastoma is an aggressive and incurable brain tumor. Clinical trials of immune checkpoint inhibitors showed no clinical benefit in glioblastoma when given after surgery. However, a clinical trial in which PD1 inhibition was given prior to second surgery did show pharmacodynamic evidence for activity. This suggests that immune checkpoint inhibitors may be more effective in a setting where large tumors are present. Here we have studied immune responses to large tumors in an autochthonous mouse model of glioblastoma. Methods: Glioblastoma was induced by transfection with oncogenic plasmids injected directly into the lateral ventricle of neonatal mice. Immune responses were assessed using a combination of spectral flow cytometry and immunohistochemistry. Results: There was a marked immune response to large tumors, with significant increases in CD4 T cells and dendritic cells. T cell changes occurred primarily at leptomeningeal/perivascular border sites. A large proportion of CD4 T cells expressed PD1, and half of these were regulatory T cells. NK cells were also increased in mice with large tumors, but were predominantly in immature states. The mouse model accurately recapitulates the formation of palisading necroses. These contain apoptotic cells and avidly recruit myeloid cells that are induced to express large amounts of TGFβ. Conclusions: Large glioblastoma tumors generate a border site population of PD1-positive T cells that may explain the pharmacodynamic response in neoadjuvant trials, and a palisading necrosis-driven immunosuppressive mechanism that may explain why responses are insufficient to provide a significant clinical benefit.
KEY POINTS: The SB mouse model accurately recapitulates immune features of human glioblastoma. Large tumors induce a significant border site immune response. Palisading necroses in large tumors counter this with a strong immunosuppressive response.
IMPORTANCE OF STUDY: Immune checkpoint inhibitors have not shown efficacy in glioblastoma when used post-surgery, but do show pharmacodynamic activity when used in patients prior to second surgery (i.e. neoadjuvant). This suggests that immune checkpoint inhibition is more effective when large tumors are present. Using a clinically relevant autochthonous mouse model, we show here that large tumors induce an immune response that is evident in leptomeningeal border sites. Large tumors in this mouse model also generate palisading necroses, a well-known diagnostic feature in glioblastoma tumors. These palisading necroses generate large amounts of TGFβ, providing a mechanism by which large tumors can suppress border site immune responses. This further supports the concept that palisading necroses are drivers of glioblastoma malignancy and suggests novel strategies to enhance responses to immune checkpoint inhibition in this cancer.